Pitch framing is the way a catcher receives and presents a pitch so that it looks like a strike. It matters most when a pitch outside the strike zone is called a strike by the umpire. Pitch framing allows pitchers to get ahead in the count, creating more opportunities for strikeouts and fewer opportunities for opposing teams to score.
Many factors go into whether a pitch is called a strike, including, but not limited to, pitch type, count, pitcher handedness, and batter side. Many catchers have adopted different stances, such as catching with a knee down, to influence pitch framing. It has been shown in many instances that catcher stance greatly impacts whether or not a pitch is called a strike.
Studies have been conducted at many levels about the effect of catcher stance on pitch framing, but not at Appalachian State. This study will include multiple regression models to predict a pitch being called a strike, bootstrapping to indicate which catcher is the most effective over time, and graphs to show the true effectiveness of different catchers and their stances at Appalachian State. These models and graphs will allow for a comprehensive analysis of pitch framing as well as a player-by-player analysis.
A right knee down stance is defined as when a catcher has their right knee on the ground. Similarly, a left knee down stance is defined as when a catcher’s left knee is on the ground. A primary stance is defined as when neither knee is on the ground.
With the coaching staff’s permission, we recorded the stance of catchers on any given pitch at home games and intrasquad games as the data analytics team did not travel to away games. No catchers were excluded from this study.
By using modeling and exploratory data analysis, I created a comprehensive analysis as well as a player-by-player analysis of pitch framing and catchers at Appalachian State.
To limit the data set, I filtered for only Appalachian State home games. Following this, I limited the data to pitches for which a catcher’s stance was recorded, because pitches without a recorded stance cannot inform a study of catcher stance and its effect on pitch framing. I also filtered for pitches that were noted as either StrikeCalled or BallCalled, as pitches put in play or swung at by the batter involve no ball-or-strike call and so say nothing about pitch framing.
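The filtering steps above can be sketched in a few lines. This is a minimal Python illustration, not the code used in the study (the console output in this report suggests the analysis itself was done in R). The field names `HomeGame` and `PitchCall` and the sample records are assumptions made for the example; `CatcherStance`, `StrikeCalled`, and `BallCalled` come from the report.

```python
# Hypothetical pitch records. HomeGame and PitchCall are assumed field names.
pitches = [
    {"HomeGame": True,  "CatcherStance": "R",  "PitchCall": "StrikeCalled"},
    {"HomeGame": True,  "CatcherStance": None, "PitchCall": "BallCalled"},
    {"HomeGame": False, "CatcherStance": "P",  "PitchCall": "BallCalled"},
    {"HomeGame": True,  "CatcherStance": "L",  "PitchCall": "InPlay"},
    {"HomeGame": True,  "CatcherStance": "P",  "PitchCall": "BallCalled"},
]

# Only pitches the umpire actually called are relevant to framing.
CALLED = {"StrikeCalled", "BallCalled"}


def keep_pitch(pitch):
    """Keep home-game pitches with a recorded stance and an umpire call."""
    return (
        pitch["HomeGame"]
        and pitch["CatcherStance"] is not None
        and pitch["PitchCall"] in CALLED
    )


called_pitches = [p for p in pitches if keep_pitch(p)]
print(len(called_pitches))
```

Applying the three conditions in a single predicate keeps the order of the filters from mattering.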
Umpires are allotted a two-inch margin for error in calling balls and strikes. Using this margin, I determined that any pitch within two inches of the edge of the strike zone should be considered a frameable pitch. A frameable pitch that was called a strike was noted as Framed. A pitch more than two inches inside the strike zone should be a strike, so these pitches were noted as strikes. A pitch within the two-inch margin that was called a ball was noted as a ball.
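One way to express this labeling rule is sketched below in Python. Several details are assumptions made for illustration and are not stated in the report: that `PlateLocSide` and `PlateLocHeight` are measured in feet, the zone dimensions used, that the two-inch band extends on both sides of the zone edge, and the `Outside` label for pitches more than two inches off the zone.

```python
MARGIN_FT = 2 / 12          # the two-inch umpire margin, in feet
ZONE_HALF_WIDTH_FT = 0.83   # assumed half-width of the strike zone
ZONE_BOTTOM_FT = 1.5        # assumed bottom of the zone
ZONE_TOP_FT = 3.5           # assumed top of the zone


def depth_inside_zone(side, height):
    """Distance to the nearest zone edge: positive inside, negative outside."""
    return min(
        ZONE_HALF_WIDTH_FT - abs(side),
        height - ZONE_BOTTOM_FT,
        ZONE_TOP_FT - height,
    )


def label_pitch(side, height, pitch_call):
    """Label a called pitch as Strike, Framed, Ball, or Outside."""
    depth = depth_inside_zone(side, height)
    if depth > MARGIN_FT:
        return "Strike"      # comfortably inside the zone
    if depth < -MARGIN_FT:
        return "Outside"     # hypothetical label; not defined in the report
    # Within two inches of the edge: a frameable pitch.
    return "Framed" if pitch_call == "StrikeCalled" else "Ball"


print(label_pitch(0.9, 2.5, "StrikeCalled"))
```

With these assumed dimensions, a pitch 0.9 ft off the center of the plate at mid-height sits just outside the edge, so a strike call on it counts as framed.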
After limiting the data and determining pitches that were framed for strikes, I created two multiple regression models to predict whether a pitch is called a strike. These models included the predictor variables PitcherThrows, PitcherSet, BatterSide, Balls, Strikes, TaggedPitchType, Catcher, CatcherStance, PlateLocSide, and PlateLocHeight. These variables were decided on after talking with other analysts and catching coach Ryan Smoot.
I decided to create two multiple regression models, the first being a random forest model. A random forest model builds many decision trees, each on a random sample of the data, and combines their predictions, here by majority vote, into a single prediction. A decision tree follows the process, “If this condition is true, then the following outcome should happen.” The same logic applies when the condition is false: the tree follows the other branch and predicts the outcome for that scenario.
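The sketch below illustrates only the final step of a random forest, combining the predictions of several trees by majority vote. It is a Python illustration, and the three "trees" are hand-written rules with made-up thresholds, not trees fitted to the Appalachian State data. A real random forest would learn each tree from a bootstrap sample of the pitches.

```python
from collections import Counter


# Three hand-written "trees" with hypothetical thresholds (not fitted to data).
def tree_location(pitch):
    return "StrikeCalled" if abs(pitch["PlateLocSide"]) < 0.9 else "BallCalled"


def tree_height(pitch):
    return "StrikeCalled" if 1.4 < pitch["PlateLocHeight"] < 3.6 else "BallCalled"


def tree_count(pitch):
    return "BallCalled" if pitch["Strikes"] == 2 else "StrikeCalled"


def forest_predict(pitch, trees):
    """Return the call predicted by the majority of the trees."""
    votes = Counter(tree(pitch) for tree in trees)
    return votes.most_common(1)[0][0]


trees = [tree_location, tree_height, tree_count]
pitch = {"PlateLocSide": 0.95, "PlateLocHeight": 2.4, "Strikes": 0}
print(forest_predict(pitch, trees))
```

Using an odd number of trees for a two-class problem avoids tied votes.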
The next model created was a GLMnet model. This is a logistic regression model with a regularization penalty that helps it cope with multicollinearity, which occurs when two or more predictor variables are highly correlated and therefore carry much the same information about what is being predicted. A model with a lot of multicollinearity has unstable coefficient estimates, tends to overfit, and predicts poorly on new data.
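For reference, the penalized logistic regression that glmnet fits for a binary outcome can be written as the minimization below, where \(y_i\) is 1 for a called strike and 0 for a called ball, \(x_i\) holds the predictors for pitch \(i\), and \(N\) is the number of pitches. The mixing parameter \(\alpha\) and penalty strength \(\lambda\) used in this study are not stated in the report, so they are left as symbols.

\[
\min_{\beta_0,\,\beta}\; -\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i\,(\beta_0 + x_i^{\top}\beta) - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big)\Big] \;+\; \lambda\Big[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Big]
\]

Setting \(\alpha = 1\) gives the lasso penalty, which can drop one of two highly correlated predictors, while \(\alpha = 0\) gives the ridge penalty, which shrinks their coefficients together.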
By comparing the models created, I was able to create the most accurate model to predict a pitch being called a strike at Appalachian State.
After creating these predictive models, I wanted to look at which catcher is most effective. This was done through bootstrapping. Using the proportion of pitches that were framed as strikes, I generated 50 seasons’ worth of data for each catcher to determine which catcher is the best at framing.
While bootstrapping made the project interesting, bootstrapped data is not real data; it is simulated from the real data. To display how effective each catcher actually was at framing, I created scatterplots of the pitches that were framed as strikes. These scatterplots indicate the location of the pitch, handedness of the pitcher, handedness of the batter, pitch type, and catcher stance. The details of each pitch can be found by hovering your mouse over each point.
Proportion tables were created to show how effective each catcher is by stance, pitch type, and even pitcher set. These tables were also created for the team as a whole.
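A proportion table of this kind can be built by counting, within each group, how many pitches were seen and how many were framed. The Python sketch below is an illustration under assumptions: the `Framed` field (1 if framed, 0 otherwise) and the sample records are hypothetical, while the output columns `n`, `FramedPitches`, and `prop` mirror the tables shown later in this report.

```python
from collections import defaultdict


def framing_table(pitches, keys):
    """Group pitches by the given keys and compute n, FramedPitches, prop."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n, framed]
    for pitch in pitches:
        group = tuple(pitch[key] for key in keys)
        counts[group][0] += 1
        counts[group][1] += pitch["Framed"]
    return {
        group: {"n": n, "FramedPitches": framed, "prop": round(framed / n, 3)}
        for group, (n, framed) in sorted(counts.items())
    }


# Hypothetical records; Framed is an assumed 0/1 indicator.
sample = [
    {"CatcherStance": "R", "PitcherSet": "Windup", "Framed": 1},
    {"CatcherStance": "R", "PitcherSet": "Windup", "Framed": 0},
    {"CatcherStance": "R", "PitcherSet": "Stretch", "Framed": 1},
    {"CatcherStance": "P", "PitcherSet": "Stretch", "Framed": 1},
]

by_stance = framing_table(sample, ["CatcherStance"])
by_stance_and_set = framing_table(sample, ["CatcherStance", "PitcherSet"])
print(by_stance)
```

Passing a different list of keys, for example stance and pitch type, produces the other breakdowns without changing the function.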
Due to data only being collected for one season, hypothesis tests cannot be conducted to determine if these proportional differences are statistically significant.
After creating the random forest and GLMnet models, I calculated their \(RMSE\), or root mean squared error, which summarizes the typical size of each model’s prediction error. Accuracy was also calculated for both models. As seen in the graphs below, the random forest model is far more accurate in predicting a pitch being called a strike, with an accuracy of over 94%.
To accomplish bootstrapping, I needed to create tables for each individual catcher. After creating tables for each catcher, I decided to look at the proportion of pitches that were framed for strikes. Comparing catchers by proportion is important because raw counts are biased: catchers saw different numbers of pitches due to injuries, skill level, and roster changes made by the coaching staff. Making pitch framing proportional makes it easier to compare catchers.
I generated data and proportions of pitches that were framed as strikes over 50 seasons. While no player will be able to play for 50 seasons, I chose this number because it is large enough to show the shape of the distribution without being unwieldy.
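Both statistics are simple to compute from a model's predictions. The Python sketch below is illustrative only: the sample outcomes and predicted probabilities are made up, and the 0.5 cutoff for turning a probability into a strike call is an assumption, since the report does not state how accuracy was computed.

```python
import math


def rmse(actual, predicted_prob):
    """Root mean squared error between 0/1 outcomes and predicted probabilities."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted_prob)]
    return math.sqrt(sum(squared) / len(squared))


def accuracy(actual, predicted_prob, threshold=0.5):
    """Share of pitches whose predicted call matches the actual call."""
    hits = sum((p >= threshold) == bool(a) for a, p in zip(actual, predicted_prob))
    return hits / len(actual)


# Hypothetical data: 1 = called strike, 0 = called ball.
actual = [1, 0, 1, 1]
probs = [0.9, 0.2, 0.4, 0.8]

model_rmse = rmse(actual, probs)
model_accuracy = accuracy(actual, probs)
print(round(model_rmse, 3), model_accuracy)
```

A lower RMSE and a higher accuracy both indicate a better model, but RMSE also penalizes confident predictions that turn out to be wrong.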
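The resampling idea can be sketched as follows in Python. Each simulated season is drawn with replacement from a catcher's observed pitches, and the framed proportion is recorded for each one. The observed counts (30 framed out of 100), the seed, and the function name are assumptions for illustration; only the choice of 50 resamples comes from the report.

```python
import random


def bootstrap_proportions(framed_flags, n_resamples=50, seed=0):
    """Resample the 0/1 framed flags with replacement; return one proportion per resample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(framed_flags)
    return [sum(rng.choices(framed_flags, k=n)) / n for _ in range(n_resamples)]


# Hypothetical catcher: 30 framed pitches out of 100 frameable pitches.
flags = [1] * 30 + [0] * 70

season_props = bootstrap_proportions(flags)
print(len(season_props), min(season_props), max(season_props))
```

Because each resample has the same size as the original data, the spread of the simulated proportions reflects how much a catcher's rate could vary from sample size alone.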
Below are the graphs of the bootstrap distributions. The shaded area is a 95% confidence interval: based on the bootstrap, we are 95% confident that the proportion of framed pitches falls within the boundaries of the shaded region.
These graphs mean little without summary statistics. If a distribution is far from normal, it should not be relied on, and outliers need to be taken into account. For this reason, each bootstrap distribution’s skewness and kurtosis were calculated. These results can be seen below. Skewness measures how asymmetric a distribution is; a skewness of 0 indicates a perfectly symmetric distribution, as a normal distribution is. Kurtosis measures how heavy a distribution’s tails are; a normal distribution has a kurtosis of 3, and the higher the kurtosis, the more prone the distribution is to outliers.
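A common way to obtain such an interval from bootstrap results is the percentile method, which takes the 2.5th and 97.5th percentiles of the simulated proportions. The report does not say which method was used, so the Python sketch below is an assumption, and the input values are made up.

```python
def percentile(ordered, q):
    """Linearly interpolated quantile of an already sorted list, with 0 <= q <= 1."""
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


def percentile_interval(values, level=0.95):
    """Return the central interval covering `level` of the values."""
    ordered = sorted(values)
    tail = (1 - level) / 2
    return percentile(ordered, tail), percentile(ordered, 1 - tail)


# Hypothetical bootstrap proportions, evenly spread from 0.00 to 1.00.
boot_props = [i / 100 for i in range(101)]
low, high = percentile_interval(boot_props)
print(round(low, 3), round(high, 3))
```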
## # A tibble: 1 × 2
## skew kurt
## <dbl> <dbl>
## 1 -0.0962 2.53
## # A tibble: 1 × 2
## skew kurt
## <dbl> <dbl>
## 1 0.988 5.72
## # A tibble: 1 × 2
## skew kurt
## <dbl> <dbl>
## 1 0.0407 2.86
## # A tibble: 1 × 2
## skew kurt
## <dbl> <dbl>
## 1 -0.133 2.72
## # A tibble: 1 × 2
## skew kurt
## <dbl> <dbl>
## 1 -0.520 4.65
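The skew and kurt statistics reported above can be computed from the central moments of a distribution. The Python sketch below uses the moment-based definitions, under which a normal distribution has a kurtosis of 3. That matches the scale of the values shown, but the exact functions used in the study are not stated, so this is an assumption. The sample values are made up.

```python
def central_moment(values, k):
    """k-th central moment: the average of (value - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Third standardized moment; 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Fourth standardized moment (not excess kurtosis); 3 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Hypothetical symmetric sample.
sample_values = [1, 2, 3, 4, 5]
print(skewness(sample_values), kurtosis(sample_values))
```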
It is also important to look at the real data and how each catcher performed. By identifying frameable pitches (those within two inches of the edge of the strike zone), scatterplots can be created for each catcher to show the pitches that they framed as strikes. Hovering over each point with your mouse presents an informational box for each pitch. The color of each dot represents catcher stance: R corresponds to right knee down, L to left knee down, and P to primary.
Below each player’s graphs are tables that tell the proportion of pitches they framed as strikes by stance, pitch type, and pitcher set. It is important for the reader to look at the tables carefully, especially at the tables that talk about pitch type as they will not be discussed in this report.
## # A tibble: 3 × 3
## RKD P LKD
## <chr> <chr> <chr>
## 1 Inf Inf 0.72
## 2 Inf 0.66 Inf
## 3 0.64 Inf Inf
## # A tibble: 21 × 6
## # Groups: Catcher, CatcherStance [3]
## Catcher CatcherStance TaggedPitchType n FramedPitches prop
## <chr> <chr> <chr> <int> <int> <chr>
## 1 Cross, Hayden L ChangeUp 12 8 0.67
## 2 Cross, Hayden L Curveball 5 4 0.80
## 3 Cross, Hayden L Cutter 3 1 0.33
## 4 Cross, Hayden L Fastball 37 29 0.78
## 5 Cross, Hayden L Other 1 1 1.00
## 6 Cross, Hayden L Sinker 1 1 1.00
## 7 Cross, Hayden L Slider 5 2 0.40
## 8 Cross, Hayden P ChangeUp 35 21 0.60
## 9 Cross, Hayden P Curveball 31 16 0.52
## 10 Cross, Hayden P Cutter 4 0 0.00
## # … with 11 more rows
## # A tibble: 6 × 6
## # Groups: CatcherStance, PitcherSet [6]
## CatcherStance PitcherSet Catcher n FramedPitches prop
## <chr> <chr> <chr> <int> <int> <chr>
## 1 L Stretch Cross, Hayden 17 10 0.59
## 2 L Windup Cross, Hayden 47 36 0.77
## 3 P Stretch Cross, Hayden 212 144 0.68
## 4 P Windup Cross, Hayden 84 51 0.61
## 5 R Stretch Cross, Hayden 250 157 0.63
## 6 R Windup Cross, Hayden 312 201 0.64
Interestingly, Cross is least effective in his most used stance, right knee down. In this stance and his primary stance, Cross catches every type of pitch and in all counts. According to the table above, Cross is most effective in his least used stance, left knee down (0.72), followed by his primary stance (0.66). This result could be due to the lack of data regarding his left knee down stance, as he rarely uses it, but he is very successful with it and uses it mostly on breaking balls.
Cross is most effective when using a left knee down stance with the pitcher in the windup. His next most effective stance while the pitcher is in the windup is his right knee down. When the pitcher comes set, Cross is most effective in the primary stance.
## # A tibble: 3 × 3
## R_prop P_prop L_prop
## <dbl> <dbl> <dbl>
## 1 Inf Inf 0.529
## 2 Inf 0.682 Inf
## 3 0.762 Inf Inf
## # A tibble: 21 × 6
## # Groups: Catcher, CatcherStance [3]
## Catcher CatcherStance TaggedPitchType n FramedPitches prop
## <chr> <chr> <chr> <int> <int> <chr>
## 1 Arnold, Carson L ChangeUp 9 2 0.22
## 2 Arnold, Carson L Curveball 6 5 0.83
## 3 Arnold, Carson L Cutter 4 4 1.00
## 4 Arnold, Carson L Fastball 37 17 0.46
## 5 Arnold, Carson L Other 1 1 1.00
## 6 Arnold, Carson L Slider 11 7 0.64
## 7 Arnold, Carson L Splitter 2 1 0.50
## 8 Arnold, Carson P ChangeUp 61 45 0.74
## 9 Arnold, Carson P Curveball 29 18 0.62
## 10 Arnold, Carson P Cutter 5 3 0.60
## # … with 11 more rows
## # A tibble: 6 × 6
## # Groups: CatcherStance, PitcherSet [6]
## CatcherStance PitcherSet Catcher n FramedPitches prop
## <chr> <chr> <chr> <int> <int> <chr>
## 1 L Stretch Arnold, Carson 18 8 0.44
## 2 L Windup Arnold, Carson 52 29 0.56
## 3 P Stretch Arnold, Carson 296 211 0.71
## 4 P Windup Arnold, Carson 116 70 0.60
## 5 R Stretch Arnold, Carson 9 7 0.78
## 6 R Windup Arnold, Carson 54 41 0.76
Arnold uses all stances for all kinds of pitches and in all counts. He is most effective using his right knee down stance, followed by his primary stance and then his left knee down stance.
Arnold is most effective with his right knee down, both when the pitcher is in the windup and when he comes set. He is also very effective in his primary stance when the pitcher is in the stretch, followed by the windup. He is least effective in a left knee down stance.
## # A tibble: 3 × 3
## R_prop P_prop L_prop
## <dbl> <dbl> <dbl>
## 1 Inf Inf 0.786
## 2 Inf 0.616 Inf
## 3 0.787 Inf Inf
## # A tibble: 15 × 6
## # Groups: Catcher, CatcherStance [3]
## Catcher CatcherStance TaggedPitchType n FramedPitches prop
## <chr> <chr> <chr> <int> <int> <chr>
## 1 Lipson, Jack L ChangeUp 6 5 0.83
## 2 Lipson, Jack L Curveball 7 4 0.57
## 3 Lipson, Jack L Fastball 23 19 0.83
## 4 Lipson, Jack L Slider 6 5 0.83
## 5 Lipson, Jack P ChangeUp 15 9 0.60
## 6 Lipson, Jack P Curveball 8 4 0.50
## 7 Lipson, Jack P Fastball 69 42 0.61
## 8 Lipson, Jack P Slider 17 13 0.76
## 9 Lipson, Jack P Splitter 3 1 0.33
## 10 Lipson, Jack R ChangeUp 8 7 0.88
## 11 Lipson, Jack R Curveball 5 4 0.80
## 12 Lipson, Jack R Fastball 32 26 0.81
## 13 Lipson, Jack R Knuckleball 1 1 1.00
## 14 Lipson, Jack R Slider 13 8 0.62
## 15 Lipson, Jack R Splitter 2 2 1.00
## # A tibble: 6 × 6
## # Groups: CatcherStance, PitcherSet [6]
## CatcherStance PitcherSet Catcher n FramedPitches prop
## <chr> <chr> <chr> <int> <int> <chr>
## 1 L Stretch Lipson, Jack 21 18 0.86
## 2 L Windup Lipson, Jack 21 15 0.71
## 3 P Stretch Lipson, Jack 74 52 0.70
## 4 P Windup Lipson, Jack 38 17 0.45
## 5 R Stretch Lipson, Jack 18 11 0.61
## 6 R Windup Lipson, Jack 43 37 0.86
Lipson is also very versatile in his stances and he is able to use all of them at any time. Interestingly, his most effective stances are his right knee down, followed by his left knee down, and then his primary stance.
When the pitcher is in the windup, Lipson is very effective in a right knee down stance, followed by left knee, and then primary. When the pitcher is in the stretch, Lipson is effective with a left knee down stance followed by primary, and right knee down stance.
## # A tibble: 3 × 3
## R_prop P_prop L_prop
## <dbl> <dbl> <dbl>
## 1 Inf Inf 1
## 2 Inf 0.833 Inf
## 3 0.805 Inf Inf
## # A tibble: 12 × 6
## # Groups: Catcher, CatcherStance [3]
## Catcher CatcherStance TaggedPitchType n FramedPitches prop
## <chr> <chr> <chr> <int> <int> <chr>
## 1 Lewis, Trent L Other 1 1 1.00
## 2 Lewis, Trent P ChangeUp 2 1 0.50
## 3 Lewis, Trent P Curveball 1 1 1.00
## 4 Lewis, Trent P Cutter 1 1 1.00
## 5 Lewis, Trent P Fastball 5 5 1.00
## 6 Lewis, Trent P Slider 3 2 0.67
## 7 Lewis, Trent R ChangeUp 2 1 0.50
## 8 Lewis, Trent R Curveball 4 3 0.75
## 9 Lewis, Trent R Cutter 2 2 1.00
## 10 Lewis, Trent R Fastball 24 20 0.83
## 11 Lewis, Trent R Slider 8 6 0.75
## 12 Lewis, Trent R Splitter 1 1 1.00
## # A tibble: 5 × 6
## # Groups: CatcherStance, PitcherSet [5]
## CatcherStance PitcherSet Catcher n FramedPitches prop
## <chr> <chr> <chr> <int> <int> <chr>
## 1 L Windup Lewis, Trent 1 1 1.00
## 2 P Stretch Lewis, Trent 9 8 0.89
## 3 P Windup Lewis, Trent 3 2 0.67
## 4 R Stretch Lewis, Trent 13 12 0.92
## 5 R Windup Lewis, Trent 28 21 0.75
Lewis tends to use his right knee down stance and is incredibly effective using it. He is perfect using his left knee down stance, though on only one recorded pitch. His effectiveness is still high using a primary stance as well.
Lewis is most effective in a left knee down stance, followed by his right knee down and primary stances, when a pitcher is in the windup. When a pitcher comes set, Lewis is most effective using a right knee down stance and then a primary stance. However, he does not have much data, so these results may be flawed due to that.
## # A tibble: 3 × 3
## R_prop P_prop L_prop
## <dbl> <dbl> <dbl>
## 1 Inf Inf 0.579
## 2 Inf 0.743 Inf
## 3 0.471 Inf Inf
## # A tibble: 11 × 6
## # Groups: Catcher, CatcherStance [3]
## Catcher CatcherStance TaggedPitchType n FramedPitches prop
## <chr> <chr> <chr> <int> <int> <chr>
## 1 Yakubinis, JD L ChangeUp 3 3 1.00
## 2 Yakubinis, JD L Curveball 1 0 0.00
## 3 Yakubinis, JD L Fastball 13 7 0.54
## 4 Yakubinis, JD L Slider 2 1 0.50
## 5 Yakubinis, JD P ChangeUp 7 5 0.71
## 6 Yakubinis, JD P Curveball 2 2 1.00
## 7 Yakubinis, JD P Fastball 22 16 0.73
## 8 Yakubinis, JD P Slider 4 3 0.75
## 9 Yakubinis, JD R Curveball 1 0 0.00
## 10 Yakubinis, JD R Fastball 11 5 0.45
## 11 Yakubinis, JD R Slider 5 3 0.60
## # A tibble: 6 × 6
## # Groups: CatcherStance, PitcherSet [6]
## CatcherStance PitcherSet Catcher n FramedPitches prop
## <chr> <chr> <chr> <int> <int> <chr>
## 1 L Stretch Yakubinis, JD 3 2 0.67
## 2 L Windup Yakubinis, JD 16 9 0.56
## 3 P Stretch Yakubinis, JD 24 15 0.62
## 4 P Windup Yakubinis, JD 11 11 1.00
## 5 R Stretch Yakubinis, JD 10 4 0.40
## 6 R Windup Yakubinis, JD 7 4 0.57
Yakubinis typically uses a primary stance and is most effective in that stance, followed by his left knee down, and then his right knee down.
Yakubinis is most effective when in a primary stance followed by his right knee down and left knee down when a pitcher is in the windup. When the pitcher is in the stretch, Yakubinis is most effective with a left knee down, followed by primary and right knee down stances.
The table below shows the proportion of pitches that were framed as strikes for the team by catcher stance. It can be seen that primary was the most effective, followed by right knee down and left knee down.
## # A tibble: 3 × 4
## CatcherStance n FramedPitches prop
## <chr> <int> <int> <dbl>
## 1 L 196 128 0.653
## 2 P 867 581 0.670
## 3 R 744 495 0.665
This table shows how pitch type affected framing by stance for the team. The table is lengthy and has a lot of variation, so the reader should look over it directly rather than rely on an explanation in this report.
## # A tibble: 25 × 5
## # Groups: CatcherStance [3]
## CatcherStance TaggedPitchType n FramedPitches prop
## <chr> <chr> <int> <int> <chr>
## 1 L ChangeUp 30 18 0.60
## 2 L Curveball 19 13 0.68
## 3 L Cutter 7 5 0.71
## 4 L Fastball 110 72 0.65
## 5 L Other 3 3 1.00
## 6 L Sinker 1 1 1.00
## 7 L Slider 24 15 0.62
## 8 L Splitter 2 1 0.50
## 9 P ChangeUp 120 81 0.68
## 10 P Curveball 71 41 0.58
## # … with 15 more rows
The table below shows the proportion of balls framed as strikes based on catcher stance and pitcher set for the team. It can be seen that, in a right knee down stance, the team frames much better when the pitcher is in the windup; in a left knee down stance, it also frames better from the windup; and in the primary stance, it frames remarkably better when the pitcher comes set.
## # A tibble: 6 × 5
## # Groups: CatcherStance [3]
## CatcherStance PitcherSet n FramedPitches prop
## <chr> <chr> <int> <int> <dbl>
## 1 L Stretch 59 38 0.644
## 2 L Windup 137 90 0.657
## 3 P Stretch 615 430 0.699
## 4 P Windup 252 151 0.599
## 5 R Stretch 300 191 0.637
## 6 R Windup 444 304 0.685
The random forest model is over 94% accurate in predicting pitches! Given new data, this model can predict whether a pitch will be called a strike or not. Considering that data was only collected for one semester and the strike zone changes based on umpire, this model can be very useful. I hypothesize that additionally using umpire as a predictor will make the model even more accurate. This is incredibly important because coaches can determine which pitcher/catcher combination is best suited to play, and plan pitch sequencing more effectively.
I found the bootstrapping to be incredibly interesting. All bootstrap distributions were approximately normal, as their skew statistics were between -.5 and .5, except for those of Arnold and Yakubinis. This means neither of their bootstrap distributions is reliable for predicting what they will do in the future. Interestingly, Jack Lipson, who was out for most of the year with an injury, has the most normal distribution and the fewest outliers, as indicated by the kurt statistic. Lipson is estimated to frame below 30% of pitches for strikes. Every other catcher is estimated to frame above or around 30% of pitches.
These findings could be due to the lack of data: as stated before, Lipson was out for most of the year with an injury, and neither Lewis nor Yakubinis caught many pitches. Cross, who saw most of the pitches this season, is expected to frame approximately 33.75% of balls as strikes, as most of his distribution lies between .325 and .350.
Comparing catching stances as a whole can be misleading given that Cross primarily used a right knee down stance and he caught most of the pitches this season. This is why I decided to compare Arnold and Lipson together. Both catchers use a variety of stances and use them at any time. Arnold’s most effective stance was his right knee down stance, followed by his primary, and left knee down. Lipson is most effective using a right knee down stance followed by a left knee down, and then a primary stance. Considering both catchers use a variety of stances, it can be hypothesized that umpires prefer a right knee down stance.
Overall, catchers at Appalachian State framed more pitches when in the primary stance. However, when a pitcher was in the windup, catchers performed better in a stance with a knee down, specifically the right knee, and were more effective in the primary stance when pitchers threw from the stretch. I believe it is necessary to continue collecting data to see if these trends continue.
Coaches can use new data to predict a pitch being called a ball or strike. Using these predictions, coaches can determine what catching stance is best for each pitch type, pitcher set, and count. These predictions can allow coaches to make more data-driven decisions about whom to play as well as whom to recruit. College baseball is a constant cycle of in and out, as it should be due to graduation. Identifying catchers that complement the pitching staff is incredibly important to create long-term success, and the random forest model can help with that. I believe this model can be improved further by adding the umpire as a predictor, as strike zones vary from umpire to umpire.
Cross is Appalachian State’s most effective pitch framer and is expected to remain so over time, as seen in his bootstrap distribution. These findings may be due to the lack of data. Arnold was almost as effective, and he saw just over half as many pitches as Cross. Lipson, Lewis, and Yakubinis saw less than a fifth of the pitches that Cross saw, making it harder to make predictions about them. I am very interested in collecting more data to see how these results change.
Catchers tend to be habitual in that they use the stance they are likely most comfortable with. This stance also tends to be their most effective stance, with the exception of Cross. Some of these results could be due to the lack of data and the need to collect more, but this report is a good starting point for catchers to improve upon. It is also important to note that catchers framed more pitches with a knee down when a pitcher was in the windup.